The plot I decided to improve comes from this page: https://www.visualcapitalist.com/the-worlds-biggest-startups-top-unicorns-of-2021/. A lot of information that could have been shown there was left out; for example, the author could have analyzed industries as a whole rather than only individual start-ups.
First, I found the dataset on Kaggle: https://www.kaggle.com/datasets/ramjasmaurya/unicorn-startups
library(plotly)
library(dplyr)
# Load the Kaggle CSV; adjust the path to wherever the file was downloaded
unicorns_data <- read.csv("~/Downloads/unicorns till sep 2022.csv")
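The column name `Valuation...B.` used in the preparation step is not a typo: `read.csv` by default passes headers through `make.names`, which replaces characters that are not valid in R names with dots. The sketch below illustrates this and the string-to-number cleaning on made-up values. The header `Valuation ($B)` and the sample strings are assumptions for illustration, not copied from the actual file.

```r
# Hypothetical raw header; the real CSV header may differ slightly.
raw_header <- "Valuation ($B)"

# read.csv(check.names = TRUE) applies make.names() to each header,
# turning the space, "(", "$" and ")" into dots.
mangled_name <- make.names(raw_header)
print(mangled_name)

# Hypothetical valuation strings in the style of the dataset.
raw_values <- c("$140", "$1,250.5", "$12.4")

# Inside a character class, "$" is a literal, so one gsub() call
# removes both the dollar sign and the thousands separator.
clean_values <- as.numeric(gsub("[$,]", "", raw_values))
print(clean_values)
```

If the original header is preferred, `read.csv(..., check.names = FALSE)` keeps it, at the cost of having to quote the name with backticks in later code.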
Then, I prepared the data:
# Strip "$" and "," from the valuation strings and convert to numeric (billions of USD)
unicorns_data$Valuation...B. <- gsub("[$,]", "", unicorns_data$Valuation...B.)
unicorns_data$Valuation...B. <- as.numeric(unicorns_data$Valuation...B.)
# Keep only start-ups valued above $10B
high_value_startups <- subset(unicorns_data, Valuation...B. > 10)
# Order industries by total valuation, largest first (left to right on the x-axis)
industry_valuation <- high_value_startups %>%
  group_by(Industry) %>%
  summarise(TotalValuation = sum(Valuation...B.)) %>%
  arrange(desc(TotalValuation))
high_value_startups$Industry <- factor(high_value_startups$Industry,
                                       levels = industry_valuation$Industry)
# Order countries by total valuation, smallest first (largest ends up at the top of the y-axis)
country_valuation <- high_value_startups %>%
  group_by(Country) %>%
  summarise(TotalValuation = sum(Valuation...B.)) %>%
  arrange(TotalValuation)
high_value_startups$Country <- factor(high_value_startups$Country,
                                      levels = country_valuation$Country)
# Turn the factor codes into numeric positions and add a little noise
# so that markers in the same industry/country cell do not fully overlap
high_value_startups$Jittered_Industry <- jitter(as.numeric(high_value_startups$Industry))
high_value_startups$Jittered_Country <- jitter(as.numeric(high_value_startups$Country))
Here, I sorted the industries and countries by total valuation, so that the plot gives a more accurate picture of the global VC economy.
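The ordering idea can be sketched on a small example: categories are ranked by their total, the factor levels are set in that order, and the integer factor codes (plus a little jitter) become plot positions, while the axis ticks stay at the unjittered integers. The toy data frame, the column names `Valuation` and `x`, and the jitter amount of 0.2 are all assumptions made for illustration; base R's `aggregate` stands in for the `dplyr` pipeline here only to keep the example free of dependencies.

```r
# Hypothetical toy data, not taken from the Kaggle dataset.
toy <- data.frame(
  Company   = c("A", "B", "C", "D"),
  Industry  = c("Fintech", "E-commerce", "Fintech", "AI"),
  Valuation = c(50, 30, 20, 15)
)

# Total valuation per industry, sorted from largest to smallest.
totals <- aggregate(Valuation ~ Industry, data = toy, FUN = sum)
totals <- totals[order(-totals$Valuation), ]

# Factor levels follow the sorted totals, so code 1 is the largest industry.
toy$Industry <- factor(toy$Industry, levels = as.character(totals$Industry))

# Marker positions: integer factor codes plus noise of at most 0.2.
set.seed(1)
toy$x <- jitter(as.numeric(toy$Industry), amount = 0.2)

# Axis ticks: exact integer codes, labelled with the level names.
tick_positions <- seq_along(levels(toy$Industry))
tick_labels    <- levels(toy$Industry)
print(data.frame(tick_positions, tick_labels))
```

Keeping the ticks at the exact integers means each label sits in the middle of its cloud of jittered markers.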
fig <- plot_ly(high_value_startups,
               x = ~Jittered_Industry,
               y = ~Jittered_Country,
               size = ~Valuation...B.,
               color = ~Industry,
               text = ~paste(Company, ":", Valuation...B., "B$", ", ", Country, ", ", City., "\nInvestors: ", Investors),
               hoverinfo = "text",
               type = 'scatter',
               mode = 'markers',
               showlegend = FALSE)
# Place ticks at the exact integer factor codes (not jittered) and label them with the level names
fig <- fig %>% layout(
  title = "Valuation of Startups by Industry and Country",
  xaxis = list(title = "Industry", tickmode = "array",
               tickvals = seq_along(levels(high_value_startups$Industry)),
               ticktext = levels(high_value_startups$Industry)),
  yaxis = list(title = "Country", tickmode = "array",
               tickvals = seq_along(levels(high_value_startups$Country)),
               ticktext = levels(high_value_startups$Country)))
fig
Final plot. In my opinion, this version is more accurate. The original graph put China's e-commerce companies at the top of the list, which suggested that China dominates the VC field. In this plot, however, we can see that in 2022 investors favored fintech as the main industry for start-up investment. The hover text also lists the VC funds that invested in each start-up, from which we can conclude that US and European global VC funds still tend to invest in the US or Europe.
These conclusions could not have been drawn from the original plot.